Big Data Ecosystem · The Path of Cultivation · Azkaban Blog

2019-04-22 | Docs Language: Simplified Chinese & English | Programming Language: Azkaban | Website: www.geekparkhub.com | OpenSource | GeekDeveloper: JEEP-711 | GitHub: github.com/geekparkhub | Gitee: gitee.com/geekparkhub

🐘 Azkaban Technology · The Path of Cultivation · Balancing Motion and Stillness 🐘



1. Azkaban Overview

1.1 Workflow Scheduling Systems


1.2 Azkaban Use Cases

1.3 Introduction to Azkaban

1.4 Azkaban Features

1.5 Common Workflow Scheduling Systems

1.6 Feature Comparison: Oozie vs. Azkaban

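| Feature | Oozie | Azkaban |
|---|---|---|
| Workflow description language | XML | Text file with key/value pairs |
| Requires a web container | Yes | Yes |
| Progress tracking | Web page | Web page |
| Hadoop job scheduling support | Yes | Yes |
| Run mode | Daemon | Daemon |
| Event notification | No | Yes |
| Requires installation | Yes | Yes |
| Compatible Hadoop versions | 0.20+ | Currently unknown |
| Retry support | Workflow node level | Yes |
| Runs arbitrary commands | Yes | Yes |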

1.7 Azkaban Architecture


2. Azkaban Deployment

2.1 Azkaban Official Download

azkaban-executor-server-2.5.0.tar.gz
azkaban-sql-script-2.5.0.tar.gz
azkaban-web-server-2.5.0.tar.gz
mysql-libs.zip

2.2 Deployment

[root@systemhub711 module]# mkdir azkaban
[root@systemhub711 software]# tar -zxvf azkaban-executor-server-2.5.0.tar.gz -C /opt/module/azkaban/
[root@systemhub711 software]# tar -zxvf azkaban-web-server-2.5.0.tar.gz -C /opt/module/azkaban/
[root@systemhub711 software]# tar -zxvf azkaban-sql-script-2.5.0.tar.gz -C /opt/module/azkaban/
[root@systemhub711 module]# cd azkaban/
[root@systemhub711 azkaban]# mv azkaban-2.5.0 azkaban
[root@systemhub711 azkaban]# mv azkaban-executor-2.5.0 azkaban-executor
[root@systemhub711 azkaban]# mv azkaban-web-2.5.0 azkaban-web
[root@systemhub711 software]# mysql -uroot -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
mysql> create database azkaban;
Query OK, 1 row affected (0.01 sec)
mysql> use azkaban;
Database changed
mysql> source /opt/module/azkaban/azkaban/create-all-sql-2.5.0.sql
Query OK, 0 rows affected (0.08 sec)
mysql> show tables;
+------------------------+
| Tables_in_azkaban |
+------------------------+
| active_executing_flows |
| active_sla |
| execution_flows |
| execution_jobs |
| execution_logs |
| project_events |
| project_files |
| project_flows |
| project_permissions |
| project_properties |
| project_versions |
| projects |
| properties |
| schedules |
| triggers |
+------------------------+
15 rows in set (0.00 sec)
mysql>

2.3 Generate the Keystore

2.3.1 Key Parameter Description

[root@systemhub711 azkaban]# keytool -keystore keystore -alias core_flow -genkey -keyalg RSA
Enter keystore password:
Re-enter new password:
What is your first and last name?
[Unknown]:
What is the name of your organizational unit?
[Unknown]:
What is the name of your organization?
[Unknown]:
What is the name of your City or Locality?
[Unknown]:
What is the name of your State or Province?
[Unknown]:
What is the two-letter country code for this unit?
[Unknown]:
Is CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown correct?
[no]: y
Enter key password for <core_flow>
(RETURN if same as keystore password):
Re-enter new password:
[root@systemhub711 azkaban]#
[root@systemhub711 azkaban]# mv keystore /opt/module/azkaban/azkaban-web
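
The flags used in the keytool command above, together with a non-interactive variant, can be sketched as follows. The function name `generate_keystore` and its password argument are illustrative and not part of Azkaban; `-storepass`, `-keypass`, and `-dname` are standard keytool options that answer the prompts up front. The distinguished name simply mirrors the `Unknown` values accepted in the transcript.

```shell
# generate_keystore FILE PASSWORD (hypothetical helper)
# Non-interactive equivalent of the prompts shown above.
#   -keystore  path of the keystore file to create
#   -alias     name of the key entry inside the keystore
#   -genkey    generate a new key pair (newer JDKs also accept -genkeypair)
#   -keyalg    key algorithm, RSA here
#   -storepass keystore password (keytool requires at least 6 characters)
#   -keypass   key password; using the same value matches "RETURN if same"
#   -dname     distinguished name, replacing the name/organization prompts
generate_keystore() {
    keytool -genkey -keystore "$1" -alias core_flow -keyalg RSA \
        -storepass "$2" -keypass "$2" \
        -dname "CN=Unknown, OU=Unknown, O=Unknown, L=Unknown, ST=Unknown, C=Unknown"
}
```

A call such as `generate_keystore keystore 'your-password'` would create the file in the current directory. Passing a password on the command line exposes it to other users through the process list, so the interactive form may be preferable on shared hosts.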

2.4 Time Synchronization

[root@systemhub711 azkaban-web]# ll /usr/share/zoneinfo/Asia/Shanghai
-rw-r--r--. 3 root root 405 Oct 16 2013 /usr/share/zoneinfo/Asia/Shanghai
[root@systemhub711 azkaban-web]#
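
Beyond confirming that the zone file exists, it can help to check what offset the system actually resolves for the zone that `default.timezone.id` will use. A minimal sketch, where `tz_offset` is a hypothetical helper name:

```shell
# tz_offset ZONE (hypothetical helper): print the UTC offset `date` reports for ZONE.
# If the zone file is missing, the C library typically falls back to UTC (+0000),
# so an unexpected +0000 is a hint that the time zone data is not installed.
tz_offset() {
    TZ="$1" date '+%z'
}
```

With the zone file from the listing above in place, `tz_offset Asia/Shanghai` should report `+0800`.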

2.5 Configuration Files

2.5.1 WebServer Configuration

[root@systemhub711 azkaban]# cd azkaban-web/conf/
[root@systemhub711 conf]# ll
total 8
-rw-r--r-- 1 root root 1022 Apr 22 2014 azkaban.properties
-rw-r--r-- 1 root root 266 Apr 22 2014 azkaban-users.xml
[root@systemhub711 conf]# vim azkaban.properties
#Azkaban Personalization Settings
# Web display name
azkaban.name=System Flow
# Web description
azkaban.label=Azkaban Core Flow
azkaban.color=#FF3601
azkaban.default.servlet.path=/index
# Directory where the WebServer stores its web files
web.resource.dir=/opt/module/azkaban/azkaban-web/web/
# Default time zone; the shipped default is a US zone, changed here to Asia/Shanghai
default.timezone.id=Asia/Shanghai
#Azkaban UserManager class
user.manager.class=azkaban.user.XmlUserManager
# User and permission management file (absolute path)
user.manager.xml.file=/opt/module/azkaban/azkaban-web/conf/azkaban-users.xml
#Loader for projects
executor.global.properties=/opt/module/azkaban/azkaban-executor/conf/global.properties
azkaban.project.dir=projects
database.type=mysql
mysql.port=3306
mysql.host=systemhub711
mysql.database=azkaban
mysql.user=root
mysql.password=ax0pix
mysql.numconnections=100
# Velocity dev mode
velocity.dev.mode=false
# Azkaban Jetty server properties.
jetty.maxThreads=25
jetty.ssl.port=8443
jetty.port=8081
# SSL keystore file (absolute path)
jetty.keystore=/opt/module/azkaban/azkaban-web/keystore
# SSL keystore password
jetty.password=ax0pix
# SSL key password (same as the keystore password)
jetty.keypassword=ax0pix
# SSL truststore file (absolute path)
jetty.truststore=/opt/module/azkaban/azkaban-web/keystore
# SSL truststore password
jetty.trustpassword=ax0pix
# Azkaban Executor settings
executor.port=12321
# mail settings
mail.sender=
mail.host=
job.failure.email=
job.success.email=
lockdown.create.projects=false
cache.directory=cache
[root@systemhub711 conf]# vim azkaban-users.xml
<azkaban-users>
<user username="azkaban" password="azkaban" roles="admin" groups="azkaban" />
<user username="metrics" password="metrics" roles="metrics"/>
<user username="admin" password="admin" roles="admin,metrics"/>
<role name="admin" permissions="ADMIN" />
<role name="metrics" permissions="METRICS"/>
</azkaban-users>

2.5.2 ExecutorServer Configuration

#Azkaban
default.timezone.id=Asia/Shanghai
# Azkaban JobTypes Plugins
azkaban.jobtype.plugin.dir=plugins/jobtypes
#Loader for projects
executor.global.properties=/opt/module/azkaban/azkaban-executor/conf/global.properties
azkaban.project.dir=projects
database.type=mysql
mysql.port=3306
mysql.host=systemhub711
mysql.database=azkaban
mysql.user=root
mysql.password=ax0pix
mysql.numconnections=100
# Azkaban Executor settings
executor.maxThreads=50
executor.port=12321
executor.flow.threads=30
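
The two listings above repeat the same database settings and `executor.port`, and in this setup the web server and the executor need to agree on them. The sketch below compares those keys across the two files before starting either server. The helper names `prop_get` and `check_shared_props` are hypothetical, and the parsing assumes plain `key=value` lines without line continuations.

```shell
# prop_get FILE KEY (hypothetical helper): print the value of KEY.
# If KEY appears more than once, the last occurrence wins.
prop_get() {
    awk -v key="$2" -F= '$1 == key { sub(/^[^=]*=/, ""); value = $0 } END { print value }' "$1"
}

# check_shared_props WEB_CONF EXEC_CONF (hypothetical helper):
# report keys whose values differ between the two files; return 1 on any mismatch.
check_shared_props() {
    web_conf=$1
    exec_conf=$2
    rc=0
    for key in executor.port mysql.host mysql.port mysql.database mysql.user; do
        web_val=$(prop_get "$web_conf" "$key")
        exec_val=$(prop_get "$exec_conf" "$key")
        if [ "$web_val" != "$exec_val" ]; then
            echo "MISMATCH $key: web='$web_val' executor='$exec_val'"
            rc=1
        fi
    done
    return $rc
}
```

With the paths used in this deployment, the check would be run as `check_shared_props /opt/module/azkaban/azkaban-web/conf/azkaban.properties /opt/module/azkaban/azkaban-executor/conf/azkaban.properties`.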

2.6 Start the Executor Server

[root@systemhub711 azkaban-executor]# pwd
/opt/module/azkaban/azkaban-executor
[root@systemhub711 azkaban-executor]# bin/azkaban-executor-start.sh

2.7 Start the Web Server

[root@systemhub711 azkaban-web]# pwd
/opt/module/azkaban/azkaban-web
[root@systemhub711 azkaban-web]# bin/azkaban-web-start.sh
[root@systemhub711 ~]# jps
29392 AzkabanWebServer
28037 AzkabanExecutorServer
30040 Jps
[root@systemhub711 ~]#


3. Azkaban Job Scheduling

3.1 Scheduling a Single Job

[root@systemhub711 ~]# cd /opt/module/azkaban/
[root@systemhub711 azkaban]# mkdir core_jobs
[root@systemhub711 core_jobs]# vim single_workflow.job
# first job
type=command
command=echo 'This is a Single Jobs'
[root@systemhub711 core_jobs]# zip single_workflow.zip single_workflow.job
adding: single_workflow.job (deflated 8%)
[root@systemhub711 core_jobs]# ll
total 8
-rw-r--r-- 1 root root 61 May 10 20:54 single_workflow.job
-rw-r--r-- 1 root root 244 May 10 20:57 single_workflow.zip
[root@systemhub711 core_jobs]#


Job Logs
CST single_workflow INFO - Starting job single_workflow at 1557494992670
CST single_workflow INFO - Building command job executor.
CST single_workflow INFO - 1 commands to execute.
CST single_workflow INFO - Command: echo 'This is a Single Jobs'
CST single_workflow INFO - Environment variables: {JOB_OUTPUT_PROP_FILE=/opt/module/azkaban/azkaban-executor/executions/1/single_workflow_output_6225901896375074204_tmp, JOB_PROP_FILE=/opt/module/azkaban/azkaban-executor/executions/1/single_workflow_props_9124304639650346490_tmp, JOB_NAME=single_workflow}
CST single_workflow INFO - Working directory: /opt/module/azkaban/azkaban-executor/executions/1
CST single_workflow INFO - This is a Single Jobs
CST single_workflow INFO - Process completed successfully in 0 seconds.
CST single_workflow INFO - Finishing job single_workflow at 1557494992835 with status SUCCEEDED

3.2 Scheduling a Multi-Job Workflow

[root@systemhub711 core_jobs]# hadoop fs -mkdir -p /core_flow/azkaban/multitasking
[root@systemhub711 core_jobs]# vim multitasking_hdfs_workflow.job
# Multitasking HDFS WorkFlow
type=command
command=/opt/module/hadoop/bin/hadoop fs -put /opt/module/datas/test.txt /core_flow/azkaban/multitasking
[root@systemhub711 core_jobs]# vim multitasking_hive_workflow.sql
use default;
create external table multitasking_hive_workflow(
id int,
name string)
row format delimited fields terminated by '\t'
location '/core_flow/azkaban/multitasking';
[root@systemhub711 core_jobs]# vim multitasking_hive_workflow.job
# Multitasking Hive WorkFlow
type=command
command=/opt/module/hive/bin/hive -f /opt/module/azkaban/core_jobs/multitasking_hive_workflow.sql
dependencies=multitasking_hdfs_workflow
[root@systemhub711 core_jobs]# vim multitasking_select_workflow.job
# Multitasking Select WorkFlow
type=command
command=/opt/module/hive/bin/hive -e 'select * from multitasking_hive_workflow;'
dependencies=multitasking_hive_workflow
[root@systemhub711 core_jobs]# zip multitasking_workflow.zip multitasking_hdfs_workflow.job multitasking_hive_workflow.job multitasking_select_workflow.job
adding: multitasking_hdfs_workflow.job (deflated 28%)
adding: multitasking_hive_workflow.job (deflated 35%)
adding: multitasking_select_workflow.job (deflated 33%)
[root@systemhub711 core_jobs]#
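
The `dependencies=` lines are what chain the three jobs together, so a typo in a job name breaks the flow. Before zipping, the intended run order can be previewed with a small sketch built on the standard `tsort` utility. The helper name `job_order` is hypothetical, and the sketch only reads the `dependencies` key; it does not reproduce Azkaban's own validation.

```shell
# job_order DIR (hypothetical helper): print the .job files in DIR in an order
# that satisfies every "dependencies=" line.
job_order() {
    for job_file in "$1"/*.job; do
        [ -f "$job_file" ] || continue
        job=$(basename "$job_file" .job)
        # Dependencies are a comma-separated list; turn commas into spaces.
        deps=$(sed -n 's/^dependencies=//p' "$job_file" | tr ',' ' ')
        # Pairing a job with itself makes tsort list it even without dependencies.
        echo "$job $job"
        for dep in $deps; do
            echo "$dep $job"
        done
    done | tsort
}
```

Running `job_order /opt/module/azkaban/core_jobs` on a directory holding only the three files above should list the HDFS job first, then the Hive job, then the select job. A name that appears in the output without a matching `.job` file points to a misspelled dependency.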


CST multitasking_select_workflow INFO - multitasking_hive_workflow.id multitasking_hive_workflow.name
CST multitasking_select_workflow INFO - 1 TestUser001
CST multitasking_select_workflow INFO - 2 TestUser002
CST multitasking_select_workflow INFO - 3 TestUser003
CST multitasking_select_workflow INFO - 4 TestUser004

3.3 Scheduling a Java Job

package com.geekparkhub.core.azkaban.api.producehub;

import java.io.FileOutputStream;
import java.io.IOException;

/**
 * Geek International Park
 * GeekParkHub | Geek Lab
 * Website | https://www.geekparkhub.com/
 * Description | Open · Creation | OpenSource: openness makes dreams come true. GeekParkHub: building the unprecedented together
 * HackerParkHub | Hacker Park Hub
 * Website | https://www.hackerparkhub.com/
 * Description | Pioneering unknown technology, and a reverence for technology, in a fearless spirit of exploration
 * GeekDeveloper : JEEP-711
 *
 * @author system
 * <p>
 * AzkabanWorkFlow
 * <p>
 */
public class AzkabanWorkFlow {

    /**
     * Core method: writes a marker line to a file so the job's effect can be verified.
     *
     * @throws IOException if the output file cannot be written
     */
    public void run() throws IOException {
        // try-with-resources closes the stream even if write() throws
        try (FileOutputStream fileOutputStream = new FileOutputStream("/opt/module/azkaban/output.txt")) {
            fileOutputStream.write("This is a Java Task".getBytes());
        }
    }

    public static void main(String[] args) throws IOException {
        AzkabanWorkFlow workFlow = new AzkabanWorkFlow();
        workFlow.run();
    }
}
[root@systemhub711 module]# cd azkaban/
[root@systemhub711 azkaban]# ll
total 40
drwxr-xr-x 2 root root 4096 May 10 16:50 azkaban
drwxr-xr-x 10 root root 4096 May 10 19:43 azkaban-executor
drwxr-xr-x 9 root root 4096 May 10 19:43 azkaban-web
-rw-r--r-- 1 root root 21518 May 10 22:02 AzkabanWorkFlow.jar
drwxr-xr-x 2 root root 4096 May 10 22:10 core_jobs
[root@systemhub711 azkaban]#
[root@systemhub711 azkaban]# cd core_jobs/
[root@systemhub711 core_jobs]# vim java_workflow.job
# java work flow
type=javaprocess
java.class=com.geekparkhub.core.azkaban.api.producehub.AzkabanWorkFlow
classpath=/opt/module/azkaban/AzkabanWorkFlow.jar
[root@systemhub711 core_jobs]# zip java_workflow.zip java_workflow.job
adding: java_workflow.job (deflated 25%)
[root@systemhub711 core_jobs]#
[root@systemhub711 azkaban]# ll
total 44
drwxr-xr-x 2 root root 4096 May 10 16:50 azkaban
drwxr-xr-x 10 root root 4096 May 10 19:43 azkaban-executor
drwxr-xr-x 9 root root 4096 May 10 19:43 azkaban-web
-rw-r--r-- 1 root root 21518 May 10 22:02 AzkabanWorkFlow.jar
drwxr-xr-x 2 root root 4096 May 10 22:23 core_jobs
-rw-r--r-- 1 root root 19 May 10 22:29 output.txt
[root@systemhub711 azkaban]# cat output.txt
This is a Java Task
[root@systemhub711 azkaban]#
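
The transcript above starts with `AzkabanWorkFlow.jar` already in place and does not show how it was built. One way to produce such a jar with only the JDK tools is sketched below. The helper name `build_workflow_jar` and the source directory layout are assumptions; the original jar may well have been built with an IDE or a build tool instead.

```shell
# build_workflow_jar SRC_DIR JAR_PATH (hypothetical helper)
# Compile all Java sources under SRC_DIR and pack the classes into JAR_PATH.
build_workflow_jar() {
    src_dir=$1
    jar_path=$2
    classes_dir=$(mktemp -d) || return 1
    # -d writes class files into classes_dir, creating the package directories.
    find "$src_dir" -name '*.java' -exec javac -d "$classes_dir" {} + || return 1
    # -C switches into classes_dir before adding "." so paths match the package name.
    jar cf "$jar_path" -C "$classes_dir" .
}
```

For this example the call might look like `build_workflow_jar ./src /opt/module/azkaban/AzkabanWorkFlow.jar`, where `./src` is a placeholder for wherever the source file lives. Compiling with a JDK newer than the one running the executor can lead to an `UnsupportedClassVersionError` at job time.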

3.4 Scheduling an HDFS Job

[root@systemhub711 core_jobs]# vim hdfs_workflow.job
# hdfs work flow
type=command
command=/opt/module/hadoop/bin/hadoop fs -mkdir /core_flow/azkaban
[root@systemhub711 core_jobs]# zip hdfs_workflow.zip hdfs_workflow.job
adding: hdfs_workflow.job (deflated 14%)
[root@systemhub711 core_jobs]#
[root@systemhub711 azkaban]# hadoop fs -ls /core_flow/
drwxr-xr-x - root supergroup /core_flow/azkaban
drwxr-xr-x - root supergroup /core_flow/develop
drwxr-xr-x - root supergroup /core_flow/test
[root@systemhub711 azkaban]#

3.5 Scheduling a MapReduce Job

[root@systemhub711 core_jobs]# vim mapreduce_workflow.job
# mapreduce work flow
type=command
command=/opt/module/hadoop/bin/hadoop jar /opt/module/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/geekparkhub/input /core_flow/azkaban/output
[root@systemhub711 core_jobs]# zip mapreduce_workflow.zip mapreduce_workflow.job
adding: mapreduce_workflow.job (deflated 34%)
[root@systemhub711 core_jobs]#
[root@systemhub711 core_jobs]# hadoop fs -cat /core_flow/azkaban/output/*
geek 2
geekparkhub 1
hackerparkhub 5
hadoop 3
helloworld 2
test 1
[root@systemhub711 core_jobs]#
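
A MapReduce job refuses to start when its output directory already exists, so the job above succeeds once and fails on a rerun. One possible variant, sketched below, removes the stale directory first by using Azkaban's `command.1` key, which runs after `command` within the same job. The helper name `write_mapreduce_job` is hypothetical; the Hadoop paths are the ones used in this document.

```shell
# write_mapreduce_job JOB_FILE INPUT_DIR OUTPUT_DIR (hypothetical helper)
# Write a command-type job that clears OUTPUT_DIR and then runs wordcount.
write_mapreduce_job() {
    job_file=$1
    input_dir=$2
    output_dir=$3
    hadoop_home=/opt/module/hadoop
    cat > "$job_file" <<EOF
# mapreduce work flow (removes a stale output directory first)
type=command
command=$hadoop_home/bin/hadoop fs -rm -r -f $output_dir
command.1=$hadoop_home/bin/hadoop jar $hadoop_home/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount $input_dir $output_dir
EOF
}
```

For the example above the call would be `write_mapreduce_job mapreduce_workflow.job /user/geekparkhub/input /core_flow/azkaban/output`. Deleting the output automatically is destructive, so this only suits directories that hold nothing but regenerable results.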

3.6 Scheduling a Hive Script Job

[root@systemhub711 core_jobs]# vim hive_workflow.sql
use default;
create table hive_workflow(
id int,
name string
) row format delimited fields terminated by "\t";
load data local inpath "/opt/module/datas/test.txt" into table hive_workflow;
insert overwrite directory '/core_flow/azkaban/hive_workflow'
row format delimited fields terminated by '\t'
select * from hive_workflow;
[root@systemhub711 core_jobs]# vim hive_workflow.job
# hive work flow
type=command
command=/opt/module/hive/bin/hive -f /opt/module/azkaban/core_jobs/hive_workflow.sql
[root@systemhub711 core_jobs]# zip hive_workflow.zip hive_workflow.job
adding: hive_workflow.job (deflated 22%)
[root@systemhub711 core_jobs]#
[root@systemhub711 core_jobs]# hadoop fs -cat /core_flow/azkaban/hive_workflow/*
1 TestUser001
2 TestUser002
3 TestUser003
4 TestUser004
[root@systemhub711 core_jobs]#

4. The Path of Cultivation: Iterating on Technical Architecture Toward Mastery



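💡 How to contribute to this open-source document 💡

  1. Most of this blog is typed by hand, so slips are inevitable; you can help by spotting typos.

  2. There are many topics I may not have covered, so you can add material on them.

  3. The existing content is bound to have gaps or errors, so you can correct or extend it.

  4. 💡 Contributions from any field are welcome: open-source blogs, notes, articles, snippets, shared resources, ideas, OpenSource projects, code, and code reviews.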

  5. 🙈🙈🙈🙈🙈🙈🙈🙈🙈🙈🙈 issues: geekparkhub.github.io/issues 🙈🙈🙈🙈🙈🙈🙈🙈🙈🙈🙈

I hope every article helps readers learn and improve; that is what every author sets out to do.


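💌 Thank you for reading. Your comments and suggestions are welcome. 💌

Donate: this project cannot grow without your support. Buy the developer a ☕Coffee☕!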


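Acknowledgements

Please include your UserName when donating.

| ID | UserName | Donation | Money | Consume |
|---|---|---|---|---|
| 1 | Object | WeChatPay | 5RMB | One cola |
| 2 | 泰迪熊看月亮 | AliPay | 20RMB | One coffee |
| 3 | 修仙道长 | WeChatPay | 10RMB | Two colas |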

License

Apache License Version 2.0